Data Retrieval pipeline at source{d}

Data collection and processing might be less sexy than Machine Learning, but they are nevertheless crucial for any progress, and they are also something that source{d} as a company was built upon and has invested a lot in. The topic was briefly highlighted in several conference talks (go-git, gitbase, gitbase indexes). Now it is time for a full-length blog post with the details. Before we begin, a small reminder: as with most of what we do at source{d}, all the tools described in this blog post are available as open source software and packaged in source{d}, our end-user product.

Most of the recent progress in ML, and in Deep Learning in particular, is attributed to the abundance of data and the plentiful computing resources available for training large neural network models.